refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229
Conversation
…multiple output handlers.
…parate options related to a given subcommand.
WalkthroughOutput handler configuration moved from many individual getters to typed option structs; CLI parsing now groups trailing tokens into per-handler subcommands and requires exactly one handler. Aggregation flags were relocated into reducer-specific options and validation; runtime code and command assembly updated to use the new option model. ChangesOutput handler & CLI parsing
Build Configuration
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI (argv)
participant Parser as CommandLineArguments
participant App as clp-s
participant Search as kv_ir_search
participant Job as fs_search_task.py
CLI->>Parser: provide argv (including trailing handler tokens)
Parser->>Parser: split trailing tokens into handler subcommand args
Parser-->>App: return selected OutputHandlerOptions (exactly one)
App->>Search: initiate search with selected handler type
Search-->>App: responds with aggregation support check (uses handler type)
App->>Job: when dispatching reducer jobs, pass reducer args then append aggregation flags
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Ruff (0.15.12)components/job-orchestration/job_orchestration/executor/query/fs_search_task.pyUnexpected Ruff issue shape at index 6 Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/core/src/clp_s/CommandLineArguments.cpp`:
- Around line 184-211: The loop currently defaults
get_max_tokens_for_option(...).value_or(0) to 0 which lets an option with
unknown arity cause the next token (which might match a subcommand name) to be
misinterpreted as a subcommand; change the logic in the branch that sets
remaining_tokens_in_current_argument so that when
get_max_tokens_for_option(current_subcommand_options_description, option)
returns std::nullopt you handle it explicitly (e.g., throw
std::invalid_argument("unrecognized option \"...\"") or otherwise treat it as an
error) instead of silently defaulting to 0; update code around
remaining_tokens_in_current_argument, current_subcommand,
current_subcommand_arguments and collect_subcommand to rely on that explicit
handling.
- Around line 993-995: The thrown std::invalid_argument in
CommandLineArguments.cpp currently says "clp-s only supports one output handler
at a time"; change this to include the actual detected handlers from
output_options_map so users see which handlers were passed. Locate the check
using output_options_map (the one that throws when output_options_map.size() >
1) and build a descriptive message by joining the map keys / handler names
(e.g., iterate output_options_map and collect the keys into a comma-separated
string) and include that string in the exception text so the thrown message
shows the list of detected handlers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c84e61f7-19bf-4f9e-90b9-3b0851f0bfab
📒 Files selected for processing (6)
components/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp_s/CommandLineArguments.cppcomponents/core/src/clp_s/CommandLineArguments.hppcomponents/core/src/clp_s/clp-s.cppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/job-orchestration/job_orchestration/executor/query/fs_search_task.py
💤 Files with no reviewable changes (1)
- components/core/src/clp_s/CMakeLists.txt
| if (output_options_map.size() > 1) { | ||
| throw std::invalid_argument("clp-s only supports one output handler at a time"); | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider clarifying the single-handler constraint.
The error message could be more helpful by listing which handlers were detected. This would aid debugging when users accidentally specify multiple handlers.
💡 Optional enhancement for better error diagnostics
if (output_options_map.size() > 1) {
- throw std::invalid_argument("clp-s only supports one output handler at a time");
+ std::string handlers;
+ for (auto const& [name, _] : output_options_map) {
+ if (false == handlers.empty()) {
+ handlers += ", ";
+ }
+ handlers += name;
+ }
+ throw std::invalid_argument(fmt::format(
+ "clp-s only supports one output handler at a time, but found: {}",
+ handlers
+ ));
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/CommandLineArguments.cpp` around lines 993 - 995,
The thrown std::invalid_argument in CommandLineArguments.cpp currently says
"clp-s only supports one output handler at a time"; change this to include the
actual detected handlers from output_options_map so users see which handlers
were passed. Locate the check using output_options_map (the one that throws when
output_options_map.size() > 1) and build a descriptive message by joining the
map keys / handler names (e.g., iterate output_options_map and collect the keys
into a comma-separated string) and include that string in the exception text so
the thrown message shows the list of detected handlers.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@components/job-orchestration/job_orchestration/executor/query/fs_search_task.py`:
- Around line 154-155: The code currently appends "--count" when
aggregation_config.do_count_aggregation is not None, which incorrectly includes
False; change the guard to only append the flag for a truthy value (e.g., use if
aggregation_config.do_count_aggregation: or if
aggregation_config.do_count_aggregation is True:) so that --count is only added
when do_count_aggregation is true; update the check around the
command.append("--count") in the code that constructs the command (refer to
aggregation_config.do_count_aggregation and the command list where
command.append("--count") is called).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d58fee1a-8c86-4133-bfe9-aded68e6283d
📒 Files selected for processing (2)
components/core/src/clp_s/CMakeLists.txtcomponents/job-orchestration/job_orchestration/executor/query/fs_search_task.py
💤 Files with no reviewable changes (1)
- components/core/src/clp_s/CMakeLists.txt
| if aggregation_config.do_count_aggregation is not None: | ||
| command.append("--count") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find aggregation config model definition =="
rg -n -C3 'do_count_aggregation|count_by_time_bucket_size|class .*Aggregation' --type=py
echo
echo "== Find assignments/usages that may pass False =="
rg -n -C2 'do_count_aggregation\s*=' --type=py
echo
echo "== Find command construction sites for --count =="
rg -n -C3 '"--count"|\'\''--count'\''' --type=pyRepository: y-scope/clp
Length of output: 11193
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Check for explicit False assignments to do_count_aggregation =="
rg -n 'do_count_aggregation\s*[:=]\s*False' --type=py
echo
echo "== Check instantiation patterns for AggregationConfig =="
rg -n -A2 'AggregationConfig(' --type=py | head -100
echo
echo "== Check the full context of where --count flag is constructed =="
rg -n -B5 -A1 'append.*--count' --type=pyRepository: y-scope/clp
Length of output: 122
The --count flag guard should check truthiness, not non-None.
Line 154 currently adds --count whenever do_count_aggregation is not None, which includes when it is explicitly False. Since the field is typed as bool | None, False is a valid distinct state, and the flag should only be appended for truthy values.
Proposed fix
- if aggregation_config.do_count_aggregation is not None:
+ if aggregation_config.do_count_aggregation:
command.append("--count")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if aggregation_config.do_count_aggregation is not None: | |
| command.append("--count") | |
| if aggregation_config.do_count_aggregation: | |
| command.append("--count") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@components/job-orchestration/job_orchestration/executor/query/fs_search_task.py`
around lines 154 - 155, The code currently appends "--count" when
aggregation_config.do_count_aggregation is not None, which incorrectly includes
False; change the guard to only append the flag for a truthy value (e.g., use if
aggregation_config.do_count_aggregation: or if
aggregation_config.do_count_aggregation is True:) so that --count is only added
when do_count_aggregation is true; update the check around the
command.append("--count") in the code that constructs the command (refer to
aggregation_config.do_count_aggregation and the command list where
command.append("--count") is called).
LinZhihao-723
left a comment
There was a problem hiding this comment.
I don't have comments on the parsing logic. I think how we store the parsed options is the most important decision to make in this PR, since it affects how C-APIs will be written.
One way is to do the variant thing I mentioned in another comment.
The other way is we expose different entry points for search, each takes a particular config option type and initializes the output handler accordingly.
I think for now, it makes more sense to use the variant solution as it keeps the C++ version cleaner: one config variant variable instead of one variant enum + four config objects stored inside the master config.
There was a problem hiding this comment.
From the API design perspective (to make it future-proof), I think it might make sense if we:
- Make the output handler config a variant of the four different types.
- The master config only contains one variant variable, and it must be one of the four.
- Replace the switch statement with a variant visitor.
There was a problem hiding this comment.
Didn't review the coding-style thing... the functionality looks good to me.
Description
This PR reworks how we parse arguments for output destinations during search in order to make it easy to support having multiple output destinations in a follow-up PR.
Each output destination is treated like a subcommand, e.g.
<output-destination-type> <output-destination-arguments>. Previously, when supporting only one output destination, we would simply collect the unrecognized options after parsing the main search options and hand those options to an options_description specific to the relevant output destination. With multiple output destinations though we need to do a little bit of extra work to figure out which unrecognized options are related to each subcommand.This PR solves that problem by parsing the unrecognized options left to right, using both the known output destination names and the
options_descriptionobject for each subcommand to attempt to find the options for each output destination. After this step the options can be parsed with each destination-specificoptions_descriptionlike normal. We can't simply iteratively use eachoptions_descriptionwithallow_unregisteredto parse the full list of options, because different output destinations share options with the same name (e.g.--hostis used both byreducerandnetwork), so some combinations of output destinations could result in throwing errors because of repeated options. As well, we can't simply rely on splitting up the options for each subcommand based solely on the ouptut destination names, because the output destination names can also appear as arguments (e.g. in the package, because of the way our DNS is set up, a reducer query will look likeclp-s s <archive> <query> reducer --host reducer --port <port> --job-id <id> --count.This PR also refactors the
CommandLineArgumentsclass to group together options for each output destination into structs.This PR includes a small functional change, in that aggregation options are now part of the reducer output destination options. This changes how aggregation options must be specified on the command line (i.e. they explicitly need to be part of the reducer output destination arguments), but doesn't represent any change in functionality since aggregation options are only allowed for the reducer output destination anyway.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Bug Fixes
Refactor